Chris Pollett > Old Classses >
CS255

( Print View )

Student Corner:
  [Grades Sec1]

  [Submit Sec1]

  [
Lecture Notes]

  [Discussion Board]

Course Info:
  [Texts & Links]
  [Topics/Outcomes]
  [Outcomes Matrix]
  [Grading]
  [HW/Quiz Info]
  [Exam Info]
  [Regrades]
  [Honesty]
  [Additional Policies]
  [Announcements]

HWs and Quizzes:
  [Hw1]  [Hw2]  [Hw3]
  [Hw4]  [Hw5]  [Quizzes]

Practice Exams:
  [Mid]  [Final]

                           












HW#2 --- last modified February 06 2019 04:20:23..

Solution set.

Due date: Mar 9

Files to be submitted:
  Hw2.zip

Purpose: To gain experience with coding and analyzing parallel algorithms.

Related Course Outcomes:

The main course outcomes covered by this assignment are:

LO2 -- Analyze or a code a parallel algorithm using Java Threads

LO3 -- Analyze or code a parallel algorithm using a library such as OpenCL

Specification:

This homework will consist of two parts, a written part and a coding part. Both parts will be submitted in the Hw2.zip file. The written part should be in a file Hw2.pdf. For the written part of the homework you should write solutions f or the following questions. In what you turn in, make sure to write the names and student ids for each group member. For each problem, first copy and paste the question, then write your solution to it beneath it.

  1. Do problem 27.1-3 out of CLRS.
  2. Do problem 27.1-8 out of CLRS.
  3. Do problem 27.2-1 out of CLRS.
  4. Given an `n times m` matrix `A = (a_(ij))`, let `A^t = (a_(ji))` denote its transpose. The matrix `A^t A` and `A\A^t` are both symmetric square matrices that play an important role in a variety of fields like image compression, text summarization, and so on. Design a multithreaded algorithm which given `A` with `m < n` computes `A^t A` with work `Theta(n^3)` and and span `O(log^2 n)`.
  5. Suppose we have a list of items 1 to n each of which has an priority a positive integer priority and a boolean label safe (imagine family friendly search). Devise a CREW PRAM algorithm that selects the top `k` largest of input numbers whose safe property is true in `O(log n)` steps using `n/log n` processors. Assume that the items are initially arranged in a global array of structs in locations `1` through `n`.

For the coding part of this homework I would like you to write two parallel programs in Java: ThreadNorm.java and JoclNorm.java. The first program makes use of Java Threads and the second makes use of JOCL jar file that provides Java bindings to OpenCL. Both of these programs should compute the norm of a vector of integers that comes from a file. I would like you to code both of your programs so that their spans are `O(log n)`. ThreadNorm.java will be compiled from the command line via:

javac JoclNorm.java

It can use either classic Java Threads or the Java 7 Fork Join/Parallel Array frameworks in java.util.concurrent. We didn't talk about the latter so you'll be own your own to learn about if you want to use those. To run your program I will then type:

java ThreadNorm filename_with_vector_data

On this input, your program should read in the contents filename_with_vector_data, which should consist of lines with one integer followed by a new line character/line, make a vector `vec x` from these, and compute the norm `||vec x|| = sqrt(sum_i (x_i)^2)`. Finally, it should output this value followed by a new line. For example, I might have the file my_vector.txt with contents:

1
2
3
10

This corresponds to the vector `(1, 2, 3, 10)` whose norm is `sqrt(1^2 + 2^2 + 3^2 +10^2) = sqrt(114)`. So after running your program, it should output:

114

on a line by itself. Your JoclNorm.java program should do exactly the same thing but use JOCL rather than Java Threads. I.e., I'll compile it with

javac JoclNorm.java

You can assume I have set up the classpath to find the JoCL jar file. Then I'll run your program with a line like:

java JoclNorm filename_with_vector_data

Your JOCL program will probably need to make multiple calls from the CPU to the GPU. For example, one call to compute the squares in parallel. Then you might want to have logarithmically many more calls to compute the sum in a bottom up fashion, the last call also computing the square root.

For each program you should add a mechanism of your choice to time just the portion of the code in which parallel processing is done (not reading in the file, you probably want to be using System.nanoTime()). I want you to do experiments with both programs varying the length of the vector you compute the norms of. Look up the number of cores the machine you are experimenting on has, and the number of GPU shader processors it has. If you plot time versus the length of vectors you compute the norms of, does it match what you'd expect in each case if Brent's Theorem were an equality rather than an inequality? Write up your experiments also in Hw2.pdf. This timing should be turned off by default.

Point Breakdown

Written problems (1pt each - graded 0, 1/2 (partial), 1 ( correct)). 5pts
Programs compile and run from the command line as described (1/2pt). Programs meet official CS departmental Java Coding Guidelines (1/2pt) 1pt
ThreadNorm.java uses a `O(log n)` span algorithm to compute the norm using Java Threads(1pt). Program correctly computes the norm of a set of test vectors (1/2pt) 1.5pts
JoclNorm.java uses a `O(log n)` span algorithm to compute the norm using JOCL(1pt). Program correctly computes the norm of a set of test vectors (1/2pt) 1.5pts
Experiments write up 1pt
Total 10pts